feat(tools): add backward graph generation and validation tools #711
Open
Dayuxiaoshui wants to merge 5 commits into
Conversation
Thanks for your contribution!
This commit introduces backward graph generation pipeline integrated with GraphNet's test_compiler framework.
Changes:
- graph_net/torch/extractor.py: add try/except for capture_sparse_compute to support PyTorch versions where the config does not exist.
- graph_net/torch/sample_pass/backward_graph_extractor.py:
  - switch module from train() to eval() to avoid dropout/BN side effects
  - clone forward inputs with detach().clone() to avoid inplace modification
  - add _is_pure_shape_graph() to skip subgraphs with only shape ops
- tools/backward_graph_test.py:
  - batch backward FX Graph generation via aot_autograd
  - integrated test_compiler validation with auto-generated weight_meta.py
  - default GRAPH_NET_FLUCTUATION_DETECT_THRESHOLD=0.5 and trials=10
- tools/backward_kernel_dedup.py:
  - Triton kernel dedup analysis for backward graphs
Xreki (Collaborator) reviewed on May 18, 2026 and left a comment:
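Which types of samples does this PR fix backward graph generation for? Please give examples in the PR description. The data showing how the backward graph generation success rate changes after applying this PR also needs to be added to the PR description.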
      self.model_path, use_dummy_inputs=False, device=self.device
  )
- module.train()
+ module.eval()
Contributor
Author
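model.eval() does not disable gradient computation; only torch.no_grad() / torch.inference_mode() do that. eval only changes the forward behavior of specific layers (dropout → identity, BatchNorm → running stats instead of batch stats), and backpropagation works completely normally. Using eval mode is actually the better choice here.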
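The claim in this reply can be checked with a small standalone script. This is only a minimal sketch: the toy `nn.Sequential` model and the tensor shapes are assumptions made for illustration and are not taken from the PR.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model containing the two layer types that eval() affects.
model = nn.Sequential(nn.Linear(4, 4), nn.BatchNorm1d(4), nn.Dropout(p=0.5))
model.eval()  # dropout becomes identity, BatchNorm uses running stats

x = torch.randn(8, 4, requires_grad=True)
out = model(x)
out.sum().backward()  # autograd still records and runs the backward pass

eval_tracks_grad = out.requires_grad
input_has_grad = x.grad is not None
params_have_grad = all(p.grad is not None for p in model.parameters())

# Only no_grad() / inference_mode() actually switch gradient tracking off.
with torch.no_grad():
    no_grad_out = model(x)
no_grad_tracks_grad = no_grad_out.requires_grad

print(eval_tracks_grad, input_has_grad, params_have_grad, no_grad_tracks_grad)
```

The first three flags describe the eval-mode run and the last one describes the `torch.no_grad()` run, which is the contrast the reply draws.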
      self.model_path, use_dummy_inputs=False, device=self.device
  )
- module.train()
+ module.eval()
+ if self._is_pure_shape_graph(module):
Contributor
Author
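Agreed, removed. A pure shape subgraph (containing only view/reshape/transpose and similar ops) naturally returns an empty result during backward capture because its output tensors are not differentiable, so no extra pre-check is needed to skip it.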
@@ -0,0 +1,538 @@
+ #!/usr/bin/env python3
Collaborator
Contributor
Author
Removed tools/backward_graph_test.py.
+ def main():
+     parser = argparse.ArgumentParser(description="Backward kernel dedup analysis.")
Collaborator
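What kind of backward kernel dedup does this code perform? Does it deduplicate by the graph_hash.txt of model.py? That does not need extra code either; the existing code can already do it.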
…one tools
- Remove _is_pure_shape_graph() from backward_graph_extractor.py per reviewer feedback (incomplete op whitelist, not maintainable)
- Remove tools/backward_graph_test.py (use existing shell script graph_net/test/backward_graph_extractor.sh for batch processing)
- Remove tools/backward_kernel_dedup.py (use existing graph_hash.txt based dedup in graph_net/tools/deduplicated.py)
…tent
Add `kernel_dedup.py` and wire it as a `dedup` subcommand under `tools.triton_kernel_extractor`. This performs kernel-level dedup by hashing normalized Triton kernel source (triton_poi_fused_xxx.py), which is complementary to the existing graph-level dedup via graph_hash.txt.
Signed-off-by: Dayuxiaoshui <792179245@qq.com>
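To illustrate what content-based kernel dedup can look like, here is a minimal sketch. The commit message does not say how the source is normalized, so the rule used here (drop blank and comment-only lines, strip trailing whitespace) is an assumption, and `normalize_source`, `kernel_hash`, and `group_duplicates` are hypothetical names, not the functions in `kernel_dedup.py`.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def normalize_source(source: str) -> str:
    """Assumed normalization: drop blank/comment-only lines and trailing spaces."""
    kept = []
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        kept.append(line.rstrip())  # keep indentation, it is significant in Python
    return "\n".join(kept)


def kernel_hash(source: str) -> str:
    """SHA256 over the normalized kernel source."""
    return hashlib.sha256(normalize_source(source).encode("utf-8")).hexdigest()


def group_duplicates(kernels: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each content hash to the kernel files that share it."""
    groups = defaultdict(list)
    for path, source in kernels.items():
        groups[kernel_hash(source)].append(path)
    return dict(groups)


# Hypothetical kernel files; the bodies are placeholders, not real Triton code.
kernels = {
    "a/triton_poi_fused_add_0.py": "# generated\ndef kernel(x):\n    return x + 1\n",
    "b/triton_poi_fused_add_0.py": "def kernel(x):\n    return x + 1   \n\n",
    "c/triton_poi_fused_mul_0.py": "def kernel(x):\n    return x * 2\n",
}
groups = group_duplicates(kernels)
unique_count = len(groups)
print(unique_count, "unique kernels out of", len(kernels))
```

Hashing the normalized text rather than the raw file means two kernels that differ only in comments or trailing whitespace fall into the same group, which is what distinguishes this from hashing a whole graph's model.py.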
Signed-off-by: Dayuxiaoshui <792179245@qq.com>
- test_compiler: handle list/tuple outputs from backward graphs recursively in _align_output_device and output wrapping logic
- extractor: generate graph_hash.txt from model.py content when saving
Signed-off-by: Dayuxiaoshui <792179245@qq.com>
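The graph_hash.txt step in this commit can be pictured roughly as follows. This is a sketch under assumptions: `write_graph_hash` is a hypothetical helper and the temporary directory stands in for a saved sample directory; only the file names and the use of SHA256 over model.py come from the PR text.

```python
import hashlib
import tempfile
from pathlib import Path


def write_graph_hash(sample_dir: Path) -> str:
    """Hash model.py in sample_dir and store the hex digest in graph_hash.txt."""
    model_source = (sample_dir / "model.py").read_bytes()
    digest = hashlib.sha256(model_source).hexdigest()
    (sample_dir / "graph_hash.txt").write_text(digest)
    return digest


with tempfile.TemporaryDirectory() as tmp:
    sample_dir = Path(tmp)
    # Placeholder model.py content, used only to have something to hash.
    (sample_dir / "model.py").write_text("def forward(x):\n    return x\n")
    digest = write_graph_hash(sample_dir)
    stored = (sample_dir / "graph_hash.txt").read_text()

print(stored)
```

Because the digest depends only on the bytes of model.py, two samples with identical model.py files get the same graph_hash.txt, which is what graph-level dedup relies on.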
PR Overview
This PR fixes 4 critical issues in the backward_graph_extractor.py pipeline for generating backward computational graphs, adds a kernel_dedup.py tool for Triton kernel-level deduplication, and improves test_compiler.py compatibility with list-typed outputs from backward graphs.
Types of Samples Fixed
Issue 1: BatchNorm subgraphs crashing during backward graph generation
Affected samples: ultralytics/yolov6l_start2_end8_0, ultralytics/yolov9e-seg, and others containing BatchNorm layers.
Root cause: The original implementation uses module.train() mode, causing BatchNorm's running_mean/running_var to have requires_grad=True when passed to aot_module_simplified. However, _native_batch_norm_legit_no_training does not support gradient computation w.r.t. running_mean.
Fix: Switch to module.eval() mode and parse weight_meta.py original_name to identify running_mean/running_var/num_batches_tracked, excluding them from requires_grad.
Issue 2: Input tensors corrupted by inplace operations
Affected samples: All backward graph generation.
Root cause: The original code reuses raw input tensors directly. Inplace ops (e.g., add_) mutate leaf tensors, causing gradient computation anomalies.
Fix: Apply detach().clone() to all input tensors.
Issue 3: Backward graph list outputs unsupported by test_compiler
Affected samples: Backward graphs returning [tensor], e.g., mmpose/LiteHRNet-18_start2_end6_0.
Root cause: Backward graphs output [tensor] lists. test_compiler's _align_output_device and torch.equal comparison functions only handle Tensor, crashing on list types.
Fix: Add recursive handling of nested list/tuple structures in test_compiler's output alignment and comparison functions.
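The recursive handling described in this fix can be sketched without any framework dependency. The helpers below (`map_nested`, `nested_equal`) are hypothetical names, not the functions in test_compiler; plain floats stand in for tensors, and in real use the leaf callbacks would be something like a device move and `torch.equal`.

```python
from typing import Any, Callable


def map_nested(fn: Callable[[Any], Any], obj: Any) -> Any:
    """Apply fn to every leaf, preserving the list/tuple nesting."""
    if isinstance(obj, (list, tuple)):
        # Rebuild the same container type; namedtuples are not handled here.
        return type(obj)(map_nested(fn, item) for item in obj)
    return fn(obj)


def nested_equal(a: Any, b: Any, leaf_equal: Callable[[Any, Any], bool]) -> bool:
    """Compare two nested outputs leaf by leaf; structure must match too."""
    a_is_seq = isinstance(a, (list, tuple))
    b_is_seq = isinstance(b, (list, tuple))
    if a_is_seq and b_is_seq:
        return len(a) == len(b) and all(
            nested_equal(x, y, leaf_equal) for x, y in zip(a, b)
        )
    if a_is_seq or b_is_seq:
        return False  # one side is a container, the other is a leaf
    return leaf_equal(a, b)


# Floats stand in for tensors; "+ 0.0" stands in for a per-leaf device move.
expected = [1.0, (2.0, [3.0])]
actual = map_nested(lambda v: v + 0.0, expected)

same = nested_equal(expected, actual, lambda x, y: x == y)
different_shape = nested_equal(expected, [1.0, 2.0, 3.0], lambda x, y: x == y)
print(same, different_shape)
```

Keeping the traversal separate from the leaf operation means a bare Tensor, a `[tensor]` list, and deeper nestings all go through the same code path.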
Issue 4: Missing graph_hash.txt prevents kernel extraction
Affected: All backward graph samples; 0 kernels extracted after successful compilation.
Root cause: GraphExtractor does not generate graph_hash.txt when saving models. triton_kernel_extractor requires original_graph/graph_hash.txt to trigger extraction.
Fix: GraphExtractor now computes SHA256 of model.py and writes graph_hash.txt automatically.
Success Rate: Before vs. After
Before this PR (original backward_graph_extractor):
After this PR:
87.5% of fusible failures are due to output tensors without requires_grad (e.g., int64 indices/masks), a structural characteristic of fusible decomposition, not a code bug.
test_compiler Verification
Before: test_compiler crashes on backward graphs (missing weight_meta + list output).
After:
Zero false positives: No "Environment fluctuation detected" events.
Triton Kernel Dedup Tool
New tools/triton_kernel_extractor/kernel_dedup.py (invoked via dedup subcommand). Performs kernel-level dedup by hashing Triton source code content, complementary to graph-level graph_hash.txt dedup.
Changed Files
| File | Change |
| --- | --- |
| graph_net/torch/sample_pass/backward_graph_extractor.py | module.eval() + BN param filtering + detach().clone() |
| graph_net/torch/extractor.py | graph_hash.txt on save |
| graph_net_bench/torch/test_compiler.py | |
| tools/triton_kernel_extractor/kernel_dedup.py | |
| tools/triton_kernel_extractor/__main__.py | dedup subcommand |